home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
- SREGR - Simple Linear Regression
-
-
- 1. INTRODUCTION
-
- Linear regression is a method of finding the "best" straight
- line through a set of points. Each point is described by two
- values, and if there is a linear relationship between the values,
- the points will lie close to a straight line when plotted on a
- graph. For example, we might look for such a relationship
- between:
-
- - sales of the IBM PC and sales of the PC version of Lotus
- 1-2-3
-
- - sales of Lotus 1-2-3 and of books about 1-2-3
-
- - prime interest rate and business bankruptcies
-
- In each of the above examples, we might be interested in
- estimating the value of the second item (Y, the dependent var-
- iable) given a value for the first item (X, the independent
- variable). The SREGR worksheet will calculate the equation of the
- best-fit straight line through pairs of X-Y values, and will
- calculate estimates of Y, given X values, from the line. This
- worksheet is a subset of the LREGR worksheet (contained on the
- Volume 1B disk). It does not calculate any confidence limits, or
- the significance of the Student's t value for the regression. As
- a result, it can accept more pairs of input values for a
- particular main memory configuration, and it calculates faster
- than LREGR.
-
- The worksheet accepts pairs of values (labelled Y and X), and
- calculates the values of a and b in the equation Y = a + b X of
- the best-fit straight line. a is often called the intercept,
- because it is the point at which the line intercepts the Y-axis on
- a graph, and b is the slope.
-
- In addition to calculating the intercept and slope, the
- worksheet calculates two of the parameters a statistician uses to
- appraise the quality of the fit. These are:
-
- - the correlation coefficient (r), which measures the degree
- of association between X and Y. The value of r always lies
- between 1 and -1. A value of zero indicates no
- relationship; a value of +1 or -1 indicates a perfect
- relationship. The correlation coefficient is negative when
- larger values of X are associated with smaller values of Y.
-
- - the value of Student's t for the hypothesis that the slope
- (b) is zero (ie. changing the value of X doesn't have a
- significant effect on the value of Y). The larger the value
- of t, the less we are inclined to accept the hypothesis.
- Note the SREGR worksheet does not calculate the significance
- of this t-value in probability terms (but you can look it up
- in standard statistical tables).
-
-
- 08/21/84 SREGR - 1
-
-
-
-
-
-
-
-
- Statistical tables exist for the correlation coefficient and
- Student's t, which give the probability that such a value (or
- better) could be obtained purely by chance from random data.
-
- For the purpose of estimating the coefficients (a and b) of
- the regression equation, the worksheet uses all X-values for which
- Y-values are supplied. These X-Y pairs must be entered in
- consecutive rows, beginning at the top of the input area.
- However, you may enter more X-values below the ones paired with
- Y-values. The worksheet estimates the Y-values corresponding to
- all the X-values you enter. This is useful if you want to use the
- regression line as a forecasting pr estimating tool.
-
-
- 2. USING SREGR
-
- The main menu of SREGR offers you seven choices (plus Quit).
- They are as follows:
-
- Input
- This choice allows you to enter data to feed the
- calculations, and to change the limit on the number of data
- items you may input. There are two types of data: the raw
- X-Y pairs from which the regression line is derived, and
- additional X values for which Y estimates are calculated.
- Note that changing the maximum number of data items involves
- erasing all input data. Therefore, if you think you might
- need more space than the worksheet initially allows (300
- pairs), you should change the limit before you enter any
- data.
-
- Blank
- You should choose this option if you have finished with the
- data in the worksheet and want to start again with an empty
- worksheet. Note that this option wipes out all calculated
- results as well as the raw data. (You must confirm this
- choice to reduce the risk of accidental erasure.)
-
- Calc
- Make this choice when you want to calculate or recalculate
- the results. If you have changed data or entered more data
- since you last calculated, you must recalculate before
- displaying new results or graphs.
-
- Results
- This option guides you through the various results areas in
- the worksheet. The main results are all shown in the home
- screen. But there is a screen full of intermediate results
- (averages and sums of squares, for example). And alongside
- the columns in which you entered the raw data are calculated
- Y estimates. To print your results, select Print from the
- main menu (see below).
-
- Graph
- One of the clearest ways to show a linear regression is
-
-
- 08/21/84 SREGR - 2
-
-
-
-
-
-
-
- though a graph. This worksheet offers one graph - a simple
- one showing the original data points and the regression line
- through them.
-
- Print
- The print option allows you to send selected areas of the
- worksheet to your printer. You can select two different
- areas: the final and intermediate results areas (as one
- page), and the raw data and Y estimates.
-
- Agenda
- If you select this option (and confirm it when requested),
- the current worksheet is lost and replaced by the main
- agenda. Then you can load another worksheet from the disk.
- Note confirmation is required before this option proceeds,
- because it will overwrite the worksheet in memory.
-
- 2.1 Input Details
-
- The input phase allows you to enter data in a variety of
- ways. The input menu has five choices (plus Quit).
-
- The first and second choices (XYPairs and YXPairs) are very
- similar. If your source data is arranged as two columns of
- numbers, these choices will handle all pointer movement as you
- enter your data row-by-row. If the X column is on the left,
- select XYPairs; if the Y column is on the left, select YXPairs.
- Then enter your data one row at a time. Type the number on the
- left first, then the number on the right. After each number,
- press the ENTER key. Note that in the worksheet the Y column is
- always on the left. Selecting XYPairs or YXPairs merely
- determines which column receives the first number of each pair you
- type. These two options (and Est-X) are convenient if you want to
- use the numeric pad on your keyboard to enter the numbers, because
- they perform all the necessary cursor movement for you. You can
- press the Num Lock key to disable the cursor movement keys, and
- use the numeric pad for numbers. (Remember to press the Num Lock
- key again when you've finished entering your data, to re-activate
- the cursor movement keys.) When you have entered all your data,
- hold down the Ctrl key and press Break. This will stop the macro
- and exit to 1-2-3 READY mode. Then hold down the Alt key and
- press I to return to the input menu. If you choose either XYPairs
- or YXPairs when you have already entered some data, the new data
- will be added to the bottom of the existing pairs (but overlaying
- any unpaired X values for which Y estimates were required). To
- get rid of existing pairs, you must use the Blank option from the
- main menu.
-
- The third choice is Est-X. It allows you to add a column of
- unpaired X values below the existing data. The worksheet will
- calculate a Y estimate for each unpaired X value you enter. Type
- each value for X, and press the ENTER key. When you have entered
- all the values, hold down the Ctrl key and press Break. This will
- stop the macro and exit to 1-2-3 READY mode. Then hold down the
- Alt key and press I to return to the input menu.
-
-
-
- 08/21/84 SREGR - 3
-
-
-
-
-
-
-
- The fourth choice uses the standard 1-2-3 Range Input facil
- ity. In some respects this is convenient, because it confines the
- cell pointer to the unprotected cells in the X and Y input
- columns. And pressing the Esc key returns you directly to the
- input menu. However, you are responsible for all cursor movement
- within the input area. So you cannot easily use the numeric pad
- to enter your numbers with this option. Remember to enter Y
- values on the left (column B), and X values on the right (column
- C).
-
- The final choice (except Quit) in this menu is Number. It
- allows you to set the maximum number of input values. When the
- SREGR worksheet is first loaded, it is limited to 300 pairs of
- input values. If your machine has less than 256K of memory, you
- will not be able to enter even this many pairs without a 'Memory
- Full' error. However, if your machine has more memory, you may
- use this option to increase the amount of data you can input. You
- must confirm your intention to use this option, because it will
- erase any data in the input area.
-
- There will come a time when you have entered data into the
- worksheet, but you need to re-size the input area (for example,
- because you underestimated the amount of data). When you re-size
- the area, the worksheet will erase your input. But you can use
- the /File Xtract option of 1-2-3 to save the data you have entered
- and avoid re-typing it. Here's how.
-
- To save the input data, first exit to 1-2-3 READY mode,
- then press the Goto key (F5), type the word 'input', and
- press the ENTER key. You will find yourself at the top left
- corner of the input area. Then type the following 1-2-3
- command (initial letters only):
-
- /File Xtract Values
-
- 1-2-3 will prompt you for a file name. Choose a name for
- the data, type it, and press ENTER. If a file with this
- name already exists, 1-2-3 will ask you if it may be
- replaced (ie. if 1-2-3 may overwrite it). When you have
- chosen the file name, 1-2-3 asks you to specify the range to
- be saved. Press the keys Right End Down in sequence,
- followed by the ENTER key. Here, Right means the 6 key on
- the numeric pad (in cursor mode), and Down means the 2 key.
- 1-2-3 will then save your data on disk, and you may proceed
- to re-size the input area. (Hold down the Alt key and press
- M to return to the main menu.)
-
- When you have re-sized the input area, you can retrieve
- your data from the disk by exiting to 1-2-3 READY mode and
- moving to the top left corner of the input area (as
- described above). Then type the following 1-2-3 command
- (initial letters only):
-
- /File Combine Copy Entire-file
-
- 1-2-3 will present a menu of the files on disk, from which
-
-
- 08/21/84 SREGR - 4
-
-
-
-
-
-
-
- you should select the one you saved (or type its name) and
- press the ENTER key. 1-2-3 will then retrieve your data
- from the disk into the worksheet. Hold down the Alt key and
- press M to return to the main menu.
-
- There is one more option available to you for inputting data
- to this worksheet. That is to use the 1-2-3 commands. For
- example, you may already have the input data for your regression
- analysis stored in a worksheet on disk. If the data occupies two
- adjacent columns in the worksheet, with Y values in the left
- column, you can use 1-2-3's /File Combine Copy command to copy the
- data into the LREGR worksheet. To do this, simply choose Quit at
- the main menu, press the Goto key (F5), type the word 'input', and
- press ENTER. You will find yourself at the top left corner of the
- data input area. Now you can type /fcc and follow the remaining
- 1-2-3 instructions to copy the data from the worksheet on disk.
- When you have finished, hold down the Alt key and press I to
- return to the input menu. Note, do not use /Move to move data
- around in the worksheet. Moving data into a named range
- invalidates the range name along with any formulas that depend on
- it. You will probably destroy the worksheet's ability to
- calculate properly if you use /Move commands in it.
-
- 2.2 Graph Details
-
- The macro that displays the graph in this worksheet
- deliberately avoids setting many of the graph parameters. For
- example, the labels for the X- and Y- axes are initially set when
- the worksheet is loaded, and are not changed. If you wish to use
- your own labels, exit to 1-2-3 READY mode and change them. Then
- hold down the Alt key and press M to return to the main menu. The
- following is a list of the graph options that are set by the
- macro. Attempts to reset these options will be futile.
-
- - Graph subtitle
- - Graph type and ranges
- - Legends for ranges
- - Range formats
-
- To get a printed copy of any graph, first display it using
- the graph menu. The last-displayed graph is called the current
- graph. Then exit to 1-2-3 READY mode. Press the Graph key (F10),
- and verify the graph. If you wish, you may use the 1-2-3 /Graph
- commands to modify the graph before printing. Then type the
- following 1-2-3 command (initial letters only):
-
- /Graph Save
-
- 1-2-3 will ask you to specify a file name, and save the graph in
- that file. You may then use the PrintGraph program to print the
- graph. For instructions on how to use the PrintGraph program,
- read the PrintGraph section of the 1-2-3 manual.
-
- 2.3 Print Details
-
- The print macros assume you have a printer capable of 80-
-
-
- 08/21/84 SREGR - 5
-
-
-
-
-
-
-
- column printing. Before printing each option, the macros allow
- you to align the paper to the top of a new page. Press the ENTER
- key when you are ready for printing to begin. At the end, the
- macro will feed the paper to the top of a new page so that,
- normally, you won't have to keep re-aligning the paper.
-
- The final and intermediate results are printed as one,
- because they will comfortably fit onto a 66-line page. The third
- option can lead to several pages of output. The macro defines the
- top border of the page to be the rows of the worksheet that
- identify the columns being printed.
-
-
- 3. EXAMPLES
-
- 3.1 Radio or Television?
-
- In the U.K., one must buy a television licence before one can
- legally receive television programs. Prior to 1971, a radio
- licence was required to receive radio signals, unless one bought a
- television licence which covered both. With the rise of tele-
- vision in the home, one might expect a fall in the number of radio
- licences issued. Indeed, one might expect that for every
- television licence bought, one less radio licence would be sold,
- since most people owned radios before they bought televisions.
- The numbers of radio and television licences issued for the years
- 1949 to 1968 are shown in the following table:
-
- Radio and Television Licences
- Issued in the United Kingdom.
-
- YEAR RADIO TELEVISION
- (Millions)
- 1949 11.7 0.1
- 1950 12.0 0.3
- 1951 11.6 0.7
- 1952 11.2 1.4
- 1953 10.7 2.2
- 1954 10.2 3.2
- 1955 9.4 4.4
- 1956 8.6 5.6
- 1957 7.6 6.8
- 1958 6.6 8.0
- 1959 5.4 9.2
- 1960 4.5 10.4
- 1961 3.9 11.2
- 1962 3.5 11.9
- 1963 3.2 12.5
- 1964 3.0 13.0
- 1965 2.8 13.3
- 1966 2.6 13.7
- 1967 2.5 14.3
- 1968 2.5 15.0
-
-
- This table clearly shows the growth of television licences at
-
-
- 08/21/84 SREGR - 6
-
-
-
-
-
-
-
- the expense of radio licences. The data in this table is
- contained on your disk in a worksheet named TV. To use it to
- illustrate the regression analysis, perform the following steps:
-
- 1. Load the SREGR worksheet and when the main menu appears,
- Press Q to exit to 1-2-3 READY mode.
- 2. Press the Goto key (F5), then type the word 'input' and
- press the ENTER key.
- 3. Type the following 1-2-3 command (initial letters only):
- /File Combine Copy Entire-file TV
- and press the ENTER key. This will bring the saved data
- into the input area of your worksheet.
- 4. Hold down the Alt key and press M. This will return you to
- the main menu.
- 5. Choose the Calc option. When calculations are complete, you
- may use the results, graph, and print phases.
-
- The results show a highly significant correlation. The
- surprising part of the answer, however, is that the slope of the
- regression line is -1.425, and not, as one might expect, -1.
- Moreover the standard deviation of the slope - S(b) - is so small
- (0.03) that it is inconceivable that the real slope is -1 given
- the observations we have. This means the increase in the number
- of television licences sold exceeds the decrease in the number of
- radio licences by about 40%. Since anyone who bought or rented a
- television almost certainly used a radio, one must look for an
- explanation.
-
- A possible explanation for the difference is that radio
- licences were not treated seriously. It was common for people to
- use unlicenced radio receivers, and few were prosecuted. But when
- television licences were introduced, the need for a licence was
- publicised, and users of unlicenced televisions were prosecuted
- more actively. The 40% difference is likely due to people buying
- a television licence when before they did not bother to buy a
- radio licence. Another likely explanation is that the population
- of viewers was growing in total.
-
- The correlation coefficient (r) for this particular
- regression is -0.9962, which is very close to -1 (meaning perfect
- correlation). If you examine the graph, you will see that the
- points do indeed lie very close to a straight line.
-
- A note of warning is in order here. It can be very dangerous
- to assume that a cause-and-effect relationship exists between two
- factors, even when the fit is this good. One should take the
- results of any statistical analysis with a large grain of salt,
- unless there is a prior expectation of cause-and effect. In this
- particular case, we do have such expectation because owning a
- television licence removed the need to buy a radio licence.
- Measurements such as these that are taken over a period of time
- can be influenced by external factors which change over time.
- This can lead to an apparent relationship when in fact none
- exists. For example, television sales over some years were
- positively correlated with admissions to mental institutions.
- This statement could hardly justify an assertion that televisions
-
-
- 08/21/84 SREGR - 7
-
-
-
-
-
-
-
- cause mental illness! Both simply happened to be growing over the
- period the figures were observed.
-
- 3.2 Example 2 - Fictitious Data
-
- This example leads you through a complete analysis of the
- data given in the following table:
-
- Linear Regression
- Fictitious Data
- X value Y value
- 1 3
- 3 5
- 5 6
- 7 8
- 9 12
- 11 12
-
- Once you have loaded the SREGR worksheet, the following
- keystrokes will perform the analysis:
-
- Keystrokes Comments
-
- {ENTER} Press the ENTER key to select the Input choice in
- the main menu.
-
- X Choose the XYPairs option in the input menu,
- because this is how the source data is arranged.
- Note that the macro will place the cell pointer in
- the topmost cell of the X value input column.
-
- 1 {ENTER} This enters the first X value in the top cell.
- The macro will move the cell pointer one column to
- the left to receive the first Y value.
-
- 3 {ENTER} Enters the first Y value. The macro will now move
- the cell pointer to the right and down so you can
- enter the second X value.
-
- 3 {ENTER} Enter the remaining values.
- 5 {ENTER}
-
- 5 {ENTER}
- 6 {ENTER}
-
- 7 {ENTER}
- 8 {ENTER}
-
- 9 {ENTER}
- 12 {ENTER}
-
- 11 {ENTER}
- 12 {ENTER}
-
- {Ctrl} {Break} Having entered all the pairs of values, hold down
- the Ctrl key and press Break. This stops the
-
-
- 08/21/84 SREGR - 8
-
-
-
-
-
-
-
- macro, and returns you to 1-2-3 READY mode.
-
- {Alt} I Hold down the Alt key, and press I. This returns
- you to the input menu.
-
- E Select Est-X to obtain estimates of Y for
- specified X values. The macro positions the cell
- pointer at the first cell in the X input column
- below the values you have already input.
-
- 0 {ENTER} Enter the X values for which estimates are
- required.
- 2 {ENTER}
- 4 {ENTER}
- 6 {ENTER}
- 8 {ENTER}
- 10 {ENTER}
- 12 {ENTER}
-
- {Ctrl} {Break} Return to the input menu.
- {Alt} I
-
- Q Return to main menu.
-
- C Perform the calculations.
-
- At this point, all the results have been obtained, and you
- can display them by choosing Results in the main menu. Here's how
- (starting at the main menu):
-
- R Takes you to the Results menu from the main menu.
-
- {ENTER} Displays the final results on the 'home' screen.
- (Typically, you will already be on the home
- screen, and this option will have no visible
- effect.) On this screen, you can read the
- calculated regression equation, the correlation
- coefficient, and the Student's t value for testing
- the significance of the regression.
-
- I Displays the intermediate results. Here you can
- see the number of (paired) observations, and the
- mean of the (paired) X and Y values. The last
- line on this screen shows the standard deviation
- of the slope, which is used to derive the
- Student's t statistic for the significance test of
- the regression.
-
- R Displays the raw data. The display starts at the
- first screen-full of data, but you are presented
- with a menu that allows you to scroll up and down
- one screen-full at a time. Note the macro sets up
- a window so the column headings are always
- displayed above the data columns. You will not
- need to scroll for this small set of data.
-
-
-
- 08/21/84 SREGR - 9
-
-
-
-
-
-
-
- Q Returns you to the results menu.
-
- Q Returns you to the main menu from the results
- menu.
-
-
- You can display the graph provided by this worksheet as
- follows:
-
- G Takes you from the main menu to the graphics menu.
- Initially, the graphs are set up to display on a
- color monitor. If you have a monochrome monitor
- (with a graphics adaptor card), you should type
- the following three commands:
-
- O Selects the graphics options menu.
-
- B Tells 1-2-3 to plot graphs in monochrome (B&W).
-
- Q Returns you to the graphics menu.
-
- R Displays the graph showing the raw data points,
- and the regression line through them. From this
- graph, you can see how well (or not) the points
- follow the straight line. Press any key to return
- to the graphics menu.
-
- Q Returns you to the main menu.
-
- If you wish, you may Quit all menus and set up the graph
- title (not subtitle), and the X- and Y-axis legends to suit your
- needs. Then use Alt-M to return to the menus and re-display the
- graph. Since the graphics macro does not reset these parameters
- each time, they will appear on the graph as you have set them.
-
- To print the results, first make sure your printer is turned
- on and loaded with paper. Then, from the main menu, type the
- following:
-
- P To go to the print menu.
-
- R To print the final and intermediate results.
- Press ENTER when you are ready for printing to
- begin.
-
- D To print the raw data and Y estimates. Press
- ENTER when you are ready for printing to begin.
-
- Q To return to the main menu.
-
- Before starting each section of output, the macros pause for
- you to align the paper. After each section of output, they space
- to the top of the next page, so you should only have to align the
- paper once - before the first section.
-
-
-
-
- 08/21/84 SREGR - 10
-
-
-
-
-
-
-
- Menu Tree for SREGR Worksheet
-
- Input - Input X-Y data.
- . XYPairs - Input X-Y pairs of values (X first).
- . YXPairs - Input Y-X pairs of values (Y first).
- . Est-X - Enter X values to obtain Y estimates.
- . Range - Use 1-2-3 standard Range Input facility.
- . Number - Set maximum number of data pairs (erases data).
- . . No - Do not erase or resize data area.
- . . Yes - Erase and resize data area.
- . Quit - Return to main menu.
-
- Blank - Blank out data input area.
- . No - Do not blank out input area; return to main menu.
- . Yes - Blank out input area, then return to main menu.
-
- Calc - Calculate all results.
-
- Results - Display results on the screen.
- . Final - Show final results (regression equation, etc.).
- . Intermed - Show intermediate results (sums of squares, etc.).
- . Rawdata - Show raw input data and Y estimates.
- . . Down - Scroll down one screen.
- . . Up - Scroll up one screen.
- . . Quit - Return to results menu.
- . Quit - Return to main menu.
-
- Graph - Display graphs.
- . Rawdata - Display graph of input data and regression line.
- . Options - Select graph options.
- . . Color - Display graphs in color.
- . . B&W - Display graphs in monochrome.
- . . Quit - Return to graph menu.
- . View - Re-display the last-seen graph.
- . Quit - Return to main menu.
-
- Print - Send results to printer.
- . Results - Print final and intermediate results.
- . Data - Print raw data and Y estimates.
- . Quit - Return to main menu.
-
- Agenda - Return to worksheet agenda.
- . No - Do not leave this worksheet; return to main menu.
- . Yes - Clear this worksheet and load agenda.
-
- Quit - Exit to 1-2-3 READY mode.
-
-
-
-
-
-
-
-
-
-
-
-
- 08/21/84 SREGR - 11
-
-
-